Skilled Deep Research – Post-Mortem
Written: 2026-03-07 | Run: cmmc-templates (CMMC 2.0 templates for small business)
TL;DR
The run produced 5 sources from 5 workers in ~39 minutes. That's 1 source per worker. A properly functioning run should produce 10-20 sources per worker (50-100 total). Actual yield: ~5-10% of expected. The skill architecture is sound but has multiple critical bugs.
What Actually Happened (Timeline)
| Time (UTC) | Event |
|---|---|
| 05:30 | Ada spawned orchestrator |
| 05:30–05:35 | Orchestrator spawned 5-6 workers in parallel |
| 05:30–05:45 | Workers each fetched 1-2 URLs then went silent |
| 06:09 | Orchestrator declared "complete", merged 5 sources |
| 06:09 | Report written – only 1 source per worker |
Bug #1 – CRITICAL: Workers can't spawn (agentId missing)
What happened: The resume orchestrator failed immediately with:
"error": "ACP target agent is not configured. Pass `agentId` in `sessions_spawn` or set `acp.defaultAgent` in config."
Root cause: The worker prompt template in SKILL.md calls sessions_spawn without an agentId. Sub-agents (depth ≥ 1) can't inherit the default agent from config – they need it explicitly passed.
Evidence: The resume orchestrator (c17c0e28) died after 8 lines – it spawned, tried to spawn workers, got the ACP error, and stopped.
This is almost certainly why original workers also failed β if the orchestrator spawned workers using the same broken prompt, all workers would fail to spawn. The 5 results we got may have been from the orchestrator itself fetching URLs directly (hallucinating worker output), or from a version of the prompt that omitted the spawn call and fetched inline.
Fix: Add agentId to every sessions_spawn call in orchestrator and worker prompts:
```javascript
sessions_spawn({
  agentId: "ada",  // ← REQUIRED for sub-agents
  task: "...",
  runtime: "subagent",
  ...
})
```
The agentId needs to be threaded from Ada → orchestrator prompt → worker prompts at spawn time.
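As a sketch of that threading (the template text and function names here are illustrative, not the skill's actual variables), the orchestrator could render every worker spawn from a template that refuses to emit a call without an agentId:

```python
# Illustrative sketch: thread agentId from the top-level spawn down to every
# worker spawn call the orchestrator emits. Template text is hypothetical.
WORKER_SPAWN_TEMPLATE = """sessions_spawn({{
  agentId: "{agent_id}",  // REQUIRED: sub-agents do not inherit acp.defaultAgent
  task: "{task}",
  runtime: "subagent",
}})"""

def render_worker_spawn(agent_id: str, task: str) -> str:
    """Render a worker spawn call, refusing to emit one without an agentId."""
    if not agent_id:
        raise ValueError("agentId must be threaded in; sub-agents cannot inherit it")
    return WORKER_SPAWN_TEMPLATE.format(agent_id=agent_id, task=task)
```

Failing loudly at render time would have turned the silent worker deaths into an immediate, diagnosable orchestrator error.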
Bug #2 – CRITICAL: known-urls.txt not being updated
What happened: After 5 workers fetched multiple URLs, known-urls.txt contained exactly 1 URL.
Expected: Every worker should append fetched URLs to known-urls.txt for deduplication.
Root cause: The worker prompt says to "Append URL to known-urls.txt" but doesn't give the exact file path or exec command. Workers are inconsistent about whether they do this step. Also: if workers are actually running in-process in the orchestrator (due to Bug #1), they may not have write access to the right path.
Impact: No deduplication across workers. Workers could fetch the same URLs. Retry logic is also broken since known-urls.txt is the source of truth.
Fix: Give workers an explicit shell command:
```shell
echo "https://fetched-url.com" >> /home/sean/.openclaw/workspace-ada/skills-data/skilled-deep-research/[SLUG]/known-urls.txt
```
And verify the file exists and is writable at worker startup.
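A sketch of what the dedup step could look like if it moved from prompt instructions into a small helper (the function name is an assumption, not part of the current skill):

```python
import os

def record_url(known_urls_path: str, url: str) -> bool:
    """Append url to the shared known-urls file unless it is already present.

    Returns True if the URL was newly recorded, False if it was a duplicate.
    Opens in append mode so concurrent workers never truncate each other's
    entries (small line-sized appends are effectively atomic on Linux).
    """
    seen = set()
    if os.path.exists(known_urls_path):
        with open(known_urls_path) as f:
            seen = {line.strip() for line in f if line.strip()}
    if url in seen:
        return False
    with open(known_urls_path, "a") as f:
        f.write(url + "\n")
    return True
```

A worker that calls this before fetching gets deduplication and the retry source of truth for free.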
Bug #3 – HIGH: Community worker result data loss
What happened: community worker progress.json showed:
- urls_fetched: 9
- findings: 9
But community-results.md contained only 1 source block.
Root cause: Workers are supposed to checkpoint (append to results.md) after every URL. The community worker either:
1. Buffered all results in memory and wrote at the end (crashed before writing), or
2. Overwrote instead of appended on a second pass.
Fix: Enforce append-only writes in the worker prompt with explicit shell:
```shell
cat >> results.md << 'BLOCK'
### [score] [title](url)
...
---
BLOCK
```
Never write the whole file at once. Checkpoint after every single URL fetch.
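The same checkpoint discipline as a helper, assuming the block format shown above (the function name and the fsync choice are illustrative):

```python
import os

def checkpoint_finding(results_path: str, score: str, title: str,
                       url: str, notes: str) -> None:
    """Append one source block immediately after each fetch; never rewrite the file.

    "a" mode means a crash loses at most the in-flight block, never prior ones.
    """
    block = f"### [{score}] [{title}]({url})\n{notes}\n---\n"
    with open(results_path, "a") as f:
        f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the block to disk before fetching the next URL
```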
Bug #4 – HIGH: gov worker found 0 URLs
What happened: gov worker progress showed urls_found: 0, urls_fetched: 0. It was stuck on https://csrc.nist.gov/pubs/sp/800/171/a/final with no results.
Root cause: IPv6 blocking on .gov sites. The SKILL.md documents this explicitly:
"Our LXC uses IPv6 which Akamai CDNs can block on .gov/.mil sites. Never use raw web_fetch or curl without -4 on government sites."
The worker prompt tells workers to use the fetch script (which forces -4) but web_search results don't auto-use it β the worker has to consciously call the fetch script for every URL. If a worker instead used web_fetch directly (the native tool), .gov fetches silently fail or return bot-block pages.
Evidence: gov worker shows urls_found: 0 – meaning even the search returned nothing actionable, or the worker couldn't parse the results before stalling.
Fix:
1. Add an explicit validation step at worker start: verify the fetch script exists and returns 200 for a test URL.
2. Add to the worker prompt: "DO NOT use the web_fetch tool for any URL. ONLY use the fetch script."
3. Consider a pre-flight search test before committing to the URL list.
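Step 1 could be sketched as a pre-flight check; the fetch script's path and CLI shape are assumptions here, so adjust to the skill's actual helper:

```python
import subprocess

def preflight_fetch(fetch_script: str,
                    test_url: str = "https://csrc.nist.gov/") -> bool:
    """Verify the IPv4-forcing fetch script works before committing to a URL list.

    fetch_script is assumed to take a URL argument and print the body to stdout.
    Any failure returns False so the worker can abort loudly instead of
    silently reporting urls_found: 0.
    """
    try:
        result = subprocess.run(
            [fetch_script, test_url],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Treat a non-zero exit or an empty body as a failed pre-flight.
    return result.returncode == 0 and len(result.stdout) > 0
```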
Bug #5 – MEDIUM: Binary file fetch (UnicodeDecodeError)
What happened: retry-queue.md contained:
- https://media.armis.com/raw/upload/cmmc-rfp-template.docx – reason: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0
Root cause: The fetch script returns raw binary for .docx/.pdf files. Workers try to decode as UTF-8 text, fail, and log to retry queue. The retry worker would have the same problem.
Fix: Detect binary content types before fetching the full body. If the Content-Type is application/vnd.openxmlformats or application/pdf, just log the direct download URL – don't try to read the content. The existence of a direct download link is the finding, not the file contents.
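One way to sketch that content-type gate with a HEAD request (the MIME prefix list is a best guess; extend as needed):

```python
from urllib.request import Request, urlopen

# MIME prefixes treated as binary downloads (best-guess list).
BINARY_TYPES = (
    "application/pdf",
    "application/vnd.openxmlformats",  # .docx / .xlsx / .pptx family
    "application/msword",
    "application/octet-stream",
)

def is_binary_content_type(ctype: str) -> bool:
    """True if a Content-Type header value looks like a document download."""
    return ctype.lower().split(";")[0].strip().startswith(BINARY_TYPES)

def classify_url(url: str, timeout: int = 15) -> str:
    """HEAD the URL before fetching; 'binary' means log the link, skip the body."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        ctype = resp.headers.get("Content-Type", "")
    return "binary" if is_binary_content_type(ctype) else "text"
```

Routing 'binary' URLs straight to the results file also keeps them out of the retry queue, where they would fail identically on every pass.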
Bug #6 – MEDIUM: Orchestrator declared complete too fast
What happened: The orchestrator merged results at 06:09 – only ~39 minutes after workers spawned. The workers were still showing phase: fetching in their progress files at that point. The orchestrator didn't wait for completion signals.
Root cause: A2A signaling (workers → orchestrator) depends on sessions_send. If workers are broken (Bug #1), they never send WORKER_COMPLETE signals. The orchestrator's fallback is to poll progress files every 120s for up to 15 cycles (30 minutes max). After 15 cycles with no progress, it moves on – even if workers are stalled mid-fetch.
Impact: Orchestrator synthesized partial results and declared success.
Fix: Add a minimum threshold check before synthesis: "If fewer than 3 workers sent WORKER_COMPLETE and total findings < 10, do NOT synthesize β log a failure and alert Ada instead."
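The threshold check itself is a one-liner; a sketch with the numbers from the fix above as defaults:

```python
def should_synthesize(complete_signals: int, total_findings: int,
                      min_workers: int = 3, min_findings: int = 10) -> bool:
    """Gate synthesis: merge only when enough workers finished AND enough
    findings exist. Otherwise the orchestrator should log a failure and
    alert instead of declaring success."""
    return complete_signals >= min_workers and total_findings >= min_findings
```

On this run (0 WORKER_COMPLETE signals, 5 findings) the gate would have failed both conditions and flagged the run instead of shipping a report.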
Bug #7 – LOW: merge-reports.py parses results with regex, brittle
What happened: The report's source quality scores are all listed as [2/5] despite the underlying worker results showing [5/5] for the NIST templates. The merge script likely failed to parse the format correctly.
Root cause: merge-reports.py uses regex on the results markdown. Any formatting deviation (missing blank line, slightly different header) causes the parser to drop or misparse a source.
Fix: Switch to a more forgiving parser, or enforce results format with a schema validator workers run before writing. At minimum, add a format-check step to the worker prompt.
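A sketch of a more forgiving, line-based parser, assuming the `### [score] [title](url)` block format from the worker results (the tolerances for header depth and whitespace are guesses at likely deviations):

```python
import re

# Assumed block format, from the worker results excerpt:
#   ### [5/5] [Title](https://url)
# Tolerates 2-4 '#' characters and stray whitespace around the score.
HEADER = re.compile(r"^#{2,4}\s*\[(\d)\s*/\s*5\]\s*\[([^\]]+)\]\(([^)]+)\)")

def parse_results(markdown: str):
    """Parse source blocks line by line.

    Lines that don't match are skipped rather than silently corrupting
    scores, so a formatting deviation costs one source, not the whole file.
    """
    sources = []
    for line in markdown.splitlines():
        m = HEADER.match(line.strip())
        if m:
            sources.append({"score": int(m.group(1)),
                            "title": m.group(2),
                            "url": m.group(3)})
    return sources
```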
What Actually Worked
- Skill architecture – orchestrator → workers with A2A signaling is the right design
- Fetch script – when used correctly (fetch script instead of raw web_fetch), it works
- Checkpointing concept – progress.json files gave us enough telemetry to diagnose the failures
- Deduplication concept – known-urls.txt is the right approach, just not implemented correctly
- Search quality – the URLs workers found (cmmcaudit.org, GitHub, NIST) were relevant; the search step worked
Missing Capability: Site Crawling
Not a bug – a genuine missing feature. The current skill:
- Fetches individual URLs surfaced by search queries
- Does NOT follow links or traverse site structure
For deep research, this is a significant gap. Sites like cmmcaudit.org have a resource index page that links to 8+ template pages. Search engines only surface 1-2 of those. Without crawling, we miss 75%+ of available resources on resource-rich sites.
Proposed solution: A crawl.py helper script:
```python
# Given a root URL + relevance keywords, extract and score internal links
# Return top N links sorted by relevance score (anchor text match)
# Respect: depth limit (2), domain boundary, already-known URLs
```
Workers call this when they land on a page that looks like a resource index (template, download, tools pages).
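A sketch of what crawl.py's scoring core could look like (stdlib only; keyword scoring by anchor text as proposed above; depth limiting would sit in the caller):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect (href, anchor_text) pairs from a page."""
    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []
    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append((self._href, " ".join(self._text).strip()))
            self._href = None

def score_links(html: str, base_url: str, keywords, known_urls=(), top_n=10):
    """Rank same-domain links by keyword hits in anchor text; skip known URLs."""
    parser = LinkExtractor()
    parser.feed(html)
    domain = urlparse(base_url).netloc
    scored = []
    for href, text in parser.links:
        url = urljoin(base_url, href)
        if urlparse(url).netloc != domain or url in known_urls:
            continue  # enforce the domain boundary and dedup against known-urls
        score = sum(1 for kw in keywords if kw.lower() in text.lower())
        if score > 0:
            scored.append((score, url, text))
    scored.sort(key=lambda t: -t[0])
    return scored[:top_n]
```

On a resource index page like cmmcaudit.org's, this would surface the template links that search engines never return.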
Priority Fix List
| Priority | Bug | Effort | Impact |
|---|---|---|---|
| 🔴 P0 | Bug #1: agentId missing from sessions_spawn | Low – add one field | Workers can't spawn at all |
| 🔴 P0 | Bug #3: Result data loss (buffer vs append) | Low – change write pattern | 80%+ of findings lost |
| 🔴 P0 | Bug #4: gov worker IPv6 block | Low – strengthen prompt | .gov sources completely inaccessible |
| 🟠 P1 | Bug #2: known-urls.txt not updated | Low – add explicit command | Dedup broken, retry logic broken |
| 🟠 P1 | Bug #6: Orchestrator declares complete too fast | Medium – add threshold check | False "success" on failed runs |
| 🟡 P2 | Bug #5: Binary file UnicodeDecodeError | Low – content-type check | Direct download links missed |
| 🟡 P2 | Bug #7: merge-reports.py brittle parser | Medium – improve parser | Source scores wrong in final report |
| 🟢 P3 | Missing: Site crawling | High – new script + prompt changes | 10x more sources on resource-rich sites |
Recommended Fix Order
- Fix agentId (P0) – without this, nothing works
- Fix append-only results writing (P0) – without this, findings are lost
- Fix IPv6 / fetch script enforcement (P0) – .gov sources are highest quality
- Fix known-urls.txt update (P1) – enables proper dedup and retry
- Fix orchestrator completion threshold (P1) – prevents false success
- Fix binary file handling (P2)
- Fix merge parser (P2)
- Build crawl capability (P3)